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ABSTRACT 

A  collaborative  convex  framework  for  factoring  a  data  matrix  X  into 
a  non-negative  product  AS,  with  a  sparse  coefficient  matrix  S,  is 
introduced.  We  restrict  the  columns  of  the  dictionary  matrix  A  to 
coincide  with  certain  columns  of  X,  thereby  guaranteeing  a  phys¬ 
ically  meaningful  dictionary  and  dimensionality  reduction.  As  an 
example,  we  show  applications  of  the  proposed  framework  on  hy- 
perspectral  endmember  and  abundances  identification. 

Index  Terms —  Hyperspectral  endmember  detection,  non¬ 
negative  matrix  factorization,  dictionary  learning,  subset  selection, 
dimensionality  reduction 

1.  INTRODUCTION 

Dimensionality  reduction  has  been  widely  studied  in  the  signal  pro¬ 
cessing  and  computational  learning  communities  in  recent  years. 
One  of  the  major  drawbacks  of  virtually  all  popular  approaches  for 
dimensionality  reduction  is  the  lack  of  physical  meaning  in  the  re¬ 
duced  dimension  space.  This  significantly  reduces  the  applicability 
of  such  methods.  In  this  work  we  present  a  framework  for  dimen¬ 
sionality  reduction,  based  on  matrix  factorization  and  sparsity  the¬ 
ory,  that  uses  the  data  itself  for  the  low  dimensional  representation, 
thereby  guaranteeing  the  physical  fidelity.  While  the  method  is  ap¬ 
plicable  in  numerous  areas,  from  biology  to  sensor  networks,  we 
concentrate  on  an  application  in  hyperspectral  imaging  (HSI).  This 
by  itself  is  an  important  area  that  will  help  to  illustrate  the  proposed 
framework  and  demonstrate  it  in  real  data.  Applications  in  other 
disciplines,  where  the  physical  meaning  of  the  low  dimensional  rep¬ 
resentation  is  also  critical,  will  be  studied  in  the  future. 

HSI  sensors  record  up  to  several  hundred  different  frequencies  in 
the  visible,  near-infrared  and  infrared  spectrum.  This  precise  spec¬ 
tral  information  provides  some  insight  on  the  material  at  each  pixel 
in  the  image.  Due  to  relatively  low  spatial  resolution  and  the  pres¬ 
ence  of  multiple  materials  at  a  single  location  (e.g.,  tree  canopies 
above  ground  or  water  and  water  sediments),  many  pixels  in  the  im¬ 
age  contain  the  mixed  spectral  signatures  of  multiple  materials.  The 
task  of  determining  the  abundances  (presence  quantity)  of  different 
materials  in  each  pixel  is  called  spectral  unmixing.  This  is  clearly  an 
ill-posed  problem  that  requires  some  assumptions  and  data  models. 
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Unmixing  requires  a  dictionary  with  the  spectral  signatures  of 
the  possible  materials  (often  denoted  as  endmembers )  in  the  image. 
Since  these  dictionaries  can  be  difficult  to  obtain  and  might  depend 
on  the  conditions  under  which  they  were  recorded,  it  is  sometimes 
desirable  to  automatically  extract  suitable  endmembers  from  the  im¬ 
age  one  wants  to  demix  in  a  process  called  endmember  detection. 
Many  different  techniques  for  endmember  detection  have  been  pro¬ 
posed,  see  [1]  and  references  therein.  Related  although  not  yet  ap¬ 
plied  to  endmember  detection  are  subset  selection  methods  like  the 
rank-revealing  QR  decomposition  (e.g.  [2]),  which  finds  the  most 
linearly  independent  columns  of  a  matrix.1 

Simultaneously  detecting  endmembers  and  computing  abun¬ 
dances  can  be  stated  as  factorizing  the  data  matrix  X  £  Rm,d  into 
A'  «  AS,  A,S>  0,  with  both  A  £  Rm’"  and  S  £  R"’d  being 
unknown.  In  this  notation  each  column  of  A'  is  the  spectral  signa¬ 
ture  of  one  pixel  in  the  image.  Hence,  m  is  the  number  of  spectral 
bands,  d  is  the  total  number  of  pixels,  and  each  of  the  n  columns  of 
A  represents  one  endmember.  The  abundance  matrix  S  contains  the 
amounts  of  each  material  in  A  at  each  pixel  in  X.  Due  to  the  phys¬ 
ical  non-negativity  constraint  the  geometric  interpretation  of  blind 
unmixing  is  to  find  endmembers  that  span  a  cone  which  contains 
most  of  the  data. 

The  general  problem  of  representing  X  «  AS  with  A,  S  >  0  is 
known  as  non-negative  matrix  factorization  (NMF).  The  application 
of  NMF  to  hyperspectral  endmember  detection  can  for  instance  be 
found  in  [3],  NMF  problems  are  non-convex  and  typically  solved  by 
estimating  A  and  S  altematingly.  Although  variants  of  NMF  meth¬ 
ods  often  produce  good  results  in  practice,  they  are  not  guaranteed 
to  converge  to  a  global  minimum. 

Considering  that  while  material  mixtures  in  HSI  exist,  it  is  un¬ 
likely  that  pixels  contain  a  mixture  of  all  or  many  of  the  materials  in 
A,  researchers  have  recently  focused  on  encouraging  sparsity  on  the 
abundance  matrix  S  [4],  Motivated  by  the  ideas  of  dictionary  learn¬ 
ing  for  sparse  coding,  the  works  [5,  6]  proposed  explicitly  to  look  for 
endmember  matrices  that  lead  to  sparse  abundance  maps.  We  follow 
a  similar  idea,  though  our  method  will  be  fundamentally  different  in 
two  aspects:  First,  we  restrict  columns  of  our  dictionary  A  to  appear 
somewhere  in  the  data  X.  This  is  a  common  working  hypothesis 
for  moderate  ground  sampling  distance  images  and  is  called  pixel 
purity  assumption.  In  the  general  context  of  dictionary  learning  and 


'independent  of  the  work  here  described.  Laura  Balzano  and  Rob  Nowak 
developed  a  related  matrix  factorization  technique  and  connected  and  com¬ 
pared  it  to  RRQR.  We  thank  Laura  for  pointing  out  their  work  and  also  the 
possible  relationships  with  RRQR. 


non-negative  matrix  factorization  it  guarantees  the  columns  of  A  to 
be  physically  meaningful.  As  mentioned  above,  the  lack  of  physi¬ 
cal  interpretation  has  been  a  critical  shortcoming  of  standard  dimen¬ 
sionality  reduction  and  dictionary  learning  techniques,  and  not  yet 
addressed  in  these  areas  of  research.  Second,  choosing  the  dictio¬ 
nary  columns  from  the  data  enables  us  to  propose  a  convex  model 
and  hence  avoid  the  problem  of  saddle  points  or  local  minima. 

Our  method  is  based  on  the  recent  ideas  of  collaborative  spar¬ 
sity  (see  for  example  [7]  and  the  references  therein)  and  will  use  an 
energy  that  extends  the  one  studied  in  [8],  in  a  similar  setting  as  re¬ 
cently  proposed  by  Lin  et  al.  in  [9]  for  low  rank  approximation  with 
the  nuclear  norm. 

2.  CONVEX  ENDMEMBER  DETECTION  MODEL 

With  the  restriction  that  the  endmembers  in  A  be  part  of  the  data 
X,  the  underlying  problem  is  then  to  find  a  nonnegative  matrix,  T, 
such  that  XT  fts  X,  most  rows  of  T  are  zero  and  the  columns  of 
T  are  additionally  sparse.  In  other  words,  we  want  the  columns  of 
X,  which  are  the  spectral  signatures  at  each  pixel,  to  be  well  repre¬ 
sented  by  sparse  nonnegative  linear  combinations  of  the  same  sparse 
subset  of  columns  of  A.  This  subset  of  columns  corresponds  to  the 
nonzero  rows  of  T  and  will  be  the  selected  endmembers.  The  ad¬ 
ditional  sparsity  requirement  on  T  reflects  the  assumption  that  most 
pixels  are  not  mixtures  of  all  the  selected  endmembers,  but  rather 
just  a  few.  We  normalize  the  columns  of  A'  to  have  unit  I2  norm 
so  that  we  discriminate  based  solely  on  spectral  signatures  and  not 
intensity. 

It’s  already  clear  from  this  formulation  that  the  problem  is  too 
large  because  the  unknown  matrix  T  isadx  d  matrix,  where  d  is  the 
number  of  pixels.  Thus,  before  proceeding  with  the  proposed  convex 
model,  we  reduce  the  dimension  of  the  problem  by  using  clustering 
to  choose  a  subset  of  candidate  endmembers  A  from  the  columns  of 
X  and  a  submatrix  X3  £  RmXds  y  for  the  data  with  da  <  d. 
We  use  Xs  =  A  in  our  experiments  but  could  also  include  more  or 
even  all  of  the  data.  We  use  k-means  with  a  farthest-first  initializa¬ 
tion  to  select  A.  An  angle  constraint  (A;,  Aj )  <  .995  ensures  the 
endmember  candidates  are  sufficiently  distinct,  and  an  upper  bound 
nc  is  placed  on  the  number  of  allowable  clusters  so  that  T  is  at  most 
a  nc  x  da  matrix  and  the  size  of  the  problem  is  reasonable.  We  then 
propose  a  convex  model  for  the  more  manageable  problem  of  finding 
a  nonnegative  T  such  that  AT  «  Xs,  with  T  having  the  same  spar¬ 
sity  properties  described  above.  Note  that  we  have  not  convexified 
the  problem  by  pre-fixing  the  dictionary  A.  This  is  done  simply  to 
work  with  manageable  dimensions  and  datasets.  Our  convex  model 
will  still  select  the  endmembers  as  a  subset  of  this  reduced  A. 

2.1.  The  model 

Our  model  consists  of  a  data  fidelity  term  and  two  terms  that  encour¬ 
age  the  desired  sparsity  in  T.  For  simplicity,  we  consider  the  data 
fidelity  term  1 1|  (AT— Xs)\/Cw\\%,  where  ||  ■  ||f  denotes  the  Frobe- 
nius  norm,  /3  is  a  positive  constant,  and  Cw  £  R“s  x  ds  is  a  diagonal 
matrix  we  can  use  to  weight  the  columns  of  (AT  —  Xs)  so  that  it  re¬ 
flects  the  density  of  the  original  data.  So  that  only  a  few  samples  are 
cooperatively  selected  as  endmembers,  we  encourage  rows  of  T  to 
be  zero  by  penalizing  ( (TL  |T(  with  £  a  positive  constant.  Due  to 
the  nonnegativity  constraint,  this  is  the  same  as  (  JT  max,  (T(iJ).2 
This  kind  of  collaborative/structured  sparsity  regularizer  has  been 

2The  work  mentioned  above  by  Balzano  and  Nowak  uses  ||  •  || 2  instead 
of  1 1  •  1 1 00 • 


proposed  in  several  previous  works,  for  example  [8].  To  encour¬ 
age  sparsity  of  the  nonnegative  T,  we  use  a  weighted  li  norm, 
(RwaCw,T) .  Here  Rw  is  a  diagonal  matrix  of  row  weights.  We 
choose  Rw  to  be  the  identity  in  our  experiments,  but  we  could 
also  choose  these  weights  to  be  proportional  to  (Aj,  A),  where  A 
is  the  average  of  the  columns  of  A,  which  would  encourage  selec¬ 
tion  of  endmembers  towards  the  outside  of  the  cone  containing  the 
data.  The  weighting  matrix  a  has  the  same  dimension  as  T,  and  the 

-(l-(ATXa)iJ)x 

weights  are  chosen  to  be  <7;  j  =  i/(1  —  e  TP  )  for  con¬ 

stants  h  and  v.  This  means  that  cr;,j  is  small  when  the  ith  column 
of  A  is  similar  to  the  jth  column  of  Xa  and  larger  when  they  are 
dissimilar.  This  choice  of  weights  encourages  sparsity  of  T  without 
impeding  the  effectiveness  of  the  other  regularizer.  By  encouraging 
each  column  of  T  to  sum  to  something  closer  to  one.  the  weighted 
l\  penalty  prefers  data  to  be  represented  by  nearby  endmember  can¬ 
didates  when  possible,  and  this  often  results  in  a  sparser  matrix  T. 
Overall  the  proposed  convex  model  is  given  by 

minC  V'max(Tiij)  +  (RwaCw,T)  +  ^IKAT  -  XS)VC^\\%. 

T>0  j  A 

x 

(i) 


2.2.  Refinement  of  solution 

Since  we  are  using  a  convex  model  to  detect  endmembers,  it  can’t 
distinguish  between  identical  or  very  similar  endmember  candidates. 
But  the  model  works  very  reliably  when  the  columns  of  A  are  suf¬ 
ficiently  distinct,  which  they  are  by  construction.  A  limitation  is 
that  the  convex  model  is  unable  to  choose  as  endmembers  any  data 
not  represented  in  A.  Nonetheless,  as  is  shown  in  Section  3,  the  re¬ 
sults  of  this  approach  already  compare  favorably  to  other  methods. 
Moreover,  it  provides  an  excellent  initialization  for  the  alternating 
minimization  approach  to  NMF,  which  can  be  used  to  further  refine 
the  solution  if  desired.  Letting  A  be  the  endmembers  selected  by  the 
convex  model,  the  solution  is  refined  by  alternately  minimizing 

min  J||A5-A|||  +  (J?wa,5>  (2) 

A>0,S>0,|  [  Aj  —  Aj  ||2<r\j  2 

and  renormalizing  the  columns  of  A  after  each  iteration.  Here,  rj 
is  the  diameter  of  the  j th  cluster  containing  the  data  near  Aj,  and 
ensures  that  the  refined  endmembers  obtained  by  this  alternating  ap¬ 
proach  cannot  be  too  different  from  those  already  selected  by  the 
convex  model,  thereby  remaining  close  to  the  physical  space. 

2.3.  Numerical  optimization 

We  use  the  alternating  direction  method  of  multipliers  (ADMM)  [10, 
11]  to  solve  (1)  using  the  parameters  nc  —  150,  (  =  1,  /3  =  250, 
v  =  50,  and  h  =  1  —  cos(47r/180).  In  our  experiments,  we  also 
choose  Xa  =  A.  We  then  define  column  weights  Cw  that  weight 
each  column  j  by  the  number  of  pixels  in  the  j  th  cluster  (the  cluster 
centered  at  Aj)  divided  by  the  total  number  of  pixels  d.  To  refine  the 
solution  of  the  convex  model,  we  note  that  each  alternating  step  in 
the  minimization  of  (2)  is  a  convex  minimization  problem  that  can 
again  be  minimized  using  ADMM  and  its  variants.  The  update  for 
the  abundance  S  is  identical  to  the  split  Bregman  algorithm  proposed 
for  hyperspectral  demixing  in  [4],  and  its  connection  to  ADMM  is 
discussed  in  [12], 


Fig.  1:  Comparison  of  endmember  recostruction  methods 


3.  NUMERICAL  RESULTS 
3.1.  Supervised  endmember  detection 

For  comparison  purposes  we  extracted  nine  endnrembers 
from  the  standard  Indian  pines  dataset  (publicly  available 
at  https : // engineering . purdue . edu/biehl/ 

MultiSpec/hyperspectral .  html)  by  averaging  over 
the  corresponding  signals  in  the  ground  truth  region.  Then  we 
created  50  data  points  for  each  endmember,  30  data  points  for  each 
combination  of  two  different,  10  data  points  for  each  combination 
of  three  different,  and  additionally  30  data  points  as  mixtures  of 
all  endmembers.  Finally,  we  add  Gaussian  noise  with  zero  mean 
and  standard  deviation  0.006,  make  sure  our  data  is  positive,  and 
normalize  it.  We  evaluate  our  method  in  a  comparison  to  N-findr 
[13],  vertex  component  analysis  (VCA)  [14]  with  code  from  [15],  an 
NMF  method  using  the  alternating  minimization  scheme  of  our  re¬ 
finement  step  with  random  initial  conditions,  and  the  QR  algorithm. 
For  the  latter  we  simply  used  MATLAB’s  QR  algorithm  to  calculate 
a  permutation  matrix  If  such  that  XII  =  QR  with  decreasing 
diagonal  values  in  R  and  chose  the  first  nine  columns  of  A' II  as 
endmembers.  Since  the  success  of  non-convex  methods  depends 
on  the  initial  conditions  or  on  random  projections  we  run  15  tests 
with  the  same  general  settings  and  record  the  average,  maximum 
and  minimum  angle  by  which  the  reconstructed  endmember  vectors 
deviate  from  the  true  ones  in  Table  1.  For  the  tests  we  adjusted  the 
parameters  of  our  method  to  obtain  9  endmembers. 

We  can  see  that  our  method  gives  the  best  average  performance. 
Due  to  a  high  noise  level,  methods  that  rely  on  finding  the  cone  with 
maximal  volume  or  finding  most  linearly  independent  vectors,  will 
select  outliers  as  the  endmembers  and  do  not  yield  robust  results. 
Looking  at  the  minimal  and  maximal  a  we  see  the  effect  predicted. 
The  non-convex  methods  like  alternating  minimization  and  VCA  can 
outperform  our  method  on  some  examples  giving  angles  as  low  as 
1.76.  However,  due  to  the  non-convexity  they  can  sometimes  find 
results  which  are  far  off  the  true  solution  with  angles  of  6.95  or  even 


8.17  degrees.  A  big  benefit  of  our  convex  framework  is  that  we  con¬ 
sistently  produce  good  results.  The  difference  between  the  best  and 
the  worst  reconstruction  angle  for  our  method  is  0.15  degrees  with 
and  0.17  without  the  refinement,  which  underlines  its  high  stabil¬ 
ity.  Figure  1  shows  the  original  endmembers  as  well  as  an  example 
reconstructions  by  each  method  with  the  corresponding  angle  of  de¬ 
viation. 


Method 

Evaluation  on  15  test  runs 

Avg.  a 

Min.  a 

Max.  a 

Ours  refined 

3.37 

3.30 

3.42 

Ours  without  refinement 

3.93 

3.84 

4.01 

VCA 

4.76 

1.78 

6.95 

N-findr 

10.19 

7.12 

13.79 

QR 

9.87 

4.71 

12.74 

Alt.  Min. 

4.50 

1.76 

8.17 

Table  1:  Angle  of  deviation  from  true  endmembers 

3.2.  Results  on  real  hyperspectral  data 

To  show  how  our  method  performs  on  real  hyperspectral  data  we 
use  the  urban  image  (publicly  available  at  www .  tec .  army.mil/ 
hypercube).  Figure  2  shows  the  RGB  image  of  the  urban  scene, 
the  spectral  signatures  of  the  endmembers  our  method  extracted,  and 
the  abundance  maps  of  each  endmember.  We  can  see  that  we  get 
a  very  sparse  image  representation,  segmenting  into  material  cat¬ 
egories  such  as  concrete,  house  roofs,  soil  or  dirt,  grass,  and  two 
different  types  of  vegetation,  which  all  seem  to  be  reasonable  when 
visually  comparing  our  results  to  the  RGB  image. 

4.  FUTURE  RESEARCH 

We  have  presented  a  convex  method  for  endmember  detection  in  hy¬ 
perspectral  images.  However,  the  general  framework  can  be  applied 


Fig.  2:  Results  on  the  urban  hyperpectral  image 


to  a  much  wider  class  of  problems.  For  any  non-negative  matrix  fac¬ 
torization  or  dictionary  learning  method,  it  could  be  interesting  to 
find  physically  meaningful  dictionary  atoms  by  restricting  the  atoms 
to  appear  in  the  data.  Flence,  one  could  develop  a  concept  of  non¬ 
negative  principal  component  analysis  restricting  the  principal  com¬ 
ponents  to  be  part  of  the  data.  Possible  future  application  areas  in¬ 
clude  computational  biology,  sensor  networks,  and  in  general  dimen¬ 
sionality  reduction  and  compact  representation  applications  where 
the  physical  interpretation  of  the  reduced  space  is  critical. 
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